E3: A Framework for Compiling C++ Programs with Encrypted Operands
In this technical report we describe E3 (Encrypt-Everything-Everywhere), a framework which enables execution of standard C++ code with homomorphically encrypted variables. The framework automatically generates protected types so the programmer can remain oblivious to the underlying encryption scheme. C++ protected classes redefine operators according to the encryption scheme, effectively making the introduction of a new API unnecessary. In its current version, E3 supports a variety of homomorphic encryption libraries, batching, mixing different encryption schemes in the same program, as well as the ability to combine modular computation and bit-level computation.
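The operator-redefinition idea described above can be illustrated with a minimal sketch. E3 itself targets C++; Python is used here only for brevity. Everything in the block is an assumption made for illustration: `ToyScheme` is a deliberately insecure stand-in for a homomorphic backend, and `SecureInt` and `poly` are hypothetical names that are not part of E3.

```python
import secrets


class ToyScheme:
    """Deliberately insecure stand-in for a homomorphic backend (hypothetical)."""

    def __init__(self, bits=64):
        # Secret odd modulus with the top bit set; results must stay below it.
        self.p = secrets.randbits(bits) | (1 << (bits - 1)) | 1

    def encrypt(self, m):
        # Hide a non-negative m behind a random multiple of the secret modulus.
        return m + self.p * secrets.randbelow(1 << 32)

    def decrypt(self, c):
        return c % self.p


class SecureInt:
    """Protected type: arithmetic operators act on ciphertexts (hypothetical)."""

    def __init__(self, scheme, ct):
        self.scheme, self.ct = scheme, ct

    @classmethod
    def encrypt(cls, scheme, value):
        return cls(scheme, scheme.encrypt(value))

    def _ct_of(self, other):
        # Plain constants are encrypted on the fly so both operands are ciphertexts.
        if isinstance(other, SecureInt):
            return other.ct
        return self.scheme.encrypt(other)

    def __add__(self, other):
        return SecureInt(self.scheme, self.ct + self._ct_of(other))

    def __mul__(self, other):
        return SecureInt(self.scheme, self.ct * self._ct_of(other))

    __radd__ = __add__
    __rmul__ = __mul__

    def decrypt(self):
        return self.scheme.decrypt(self.ct)


def poly(x):
    # Ordinary arithmetic code: works unchanged on int and on SecureInt.
    return x * x + 3 * x + 1


scheme = ToyScheme()
plain_result = poly(7)
secure_result = poly(SecureInt.encrypt(scheme, 7)).decrypt()
```

The function `poly` contains no encryption-specific calls, which is the point of redefining operators on a protected type: the same source expression runs on plain and on encrypted operands.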
CoFHEE: A Co-processor for Fully Homomorphic Encryption Execution
The migration of computation to the cloud has raised privacy concerns as
sensitive data becomes vulnerable to attacks since it needs to be decrypted
for processing. Fully Homomorphic Encryption (FHE) mitigates this issue as it
enables meaningful computations to be performed directly on encrypted data.
Nevertheless, FHE is orders of magnitude slower than unencrypted computation,
which hinders its practicality and adoption. Therefore, improving FHE
performance is essential for its real world deployment. In this paper, we
present a year-long effort to design, implement, fabricate, and post-silicon
validate a hardware accelerator for Fully Homomorphic Encryption dubbed CoFHEE.
With a design area of , CoFHEE aims to improve performance of
ciphertext multiplications, the most demanding arithmetic FHE operation, by
accelerating several primitive operations on polynomials, such as polynomial
additions and subtractions, Hadamard product, and Number Theoretic Transform.
CoFHEE supports polynomial degrees of up to with a maximum
coefficient size of 128 bits, while it is capable of performing ciphertext
multiplications entirely on chip for . CoFHEE is fabricated in
55nm CMOS technology and achieves 250 MHz with our custom-built low-power
digital PLL design. In addition, our chip includes two communication interfaces
to the host machine: UART and SPI. This manuscript presents all steps and
design techniques in the ASIC development process, ranging from RTL design to
fabrication and validation. We evaluate our chip with performance and power
experiments and compare it against state-of-the-art software implementations
and other ASIC designs. Developed RTL files are available in an open-source
repository.
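The polynomial primitives named in this abstract (Number Theoretic Transform and Hadamard product) combine into a polynomial multiplication as sketched below. This is a minimal illustration under stated assumptions, not the chip's algorithm: the parameters `N = 8`, `Q = 17`, and `PSI = 3` are toy values chosen for readability, the ring is assumed to be Z_Q[x]/(x^N + 1), and the transform is a quadratic-time definition rather than the butterfly network a hardware design would use.

```python
Q = 17   # toy prime modulus (assumption); requires 2*N to divide Q - 1
N = 8    # toy polynomial degree (assumption)
PSI = 3  # primitive 2N-th root of unity mod Q, so PSI**N == -1 (mod Q)


def ntt(a, root):
    # Transform by definition: evaluate the polynomial at powers of root.
    return [sum(a[j] * pow(root, i * j, Q) for j in range(N)) % Q
            for i in range(N)]


def intt(values, root):
    # Inverse transform: same sum with the inverse root, scaled by 1/N.
    inv_n = pow(N, -1, Q)
    inv_root = pow(root, -1, Q)
    return [inv_n * x % Q for x in ntt(values, inv_root)]


def hadamard(x, y):
    # Coefficient-wise product in the transformed domain.
    return [(u * v) % Q for u, v in zip(x, y)]


def negacyclic_mul_ntt(a, b):
    omega = PSI * PSI % Q
    inv_psi = pow(PSI, -1, Q)
    # Pre-scale by powers of PSI so a cyclic transform yields a negacyclic product.
    a_t = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    b_t = [b[i] * pow(PSI, i, Q) % Q for i in range(N)]
    c_t = intt(hadamard(ntt(a_t, omega), ntt(b_t, omega)), omega)
    # Undo the scaling with powers of the inverse of PSI.
    return [c_t[i] * pow(inv_psi, i, Q) % Q for i in range(N)]


def negacyclic_mul_schoolbook(a, b):
    # Reference: direct multiplication with x^N replaced by -1.
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c
```

The transform turns the convolution into `N` independent modular multiplications, which is why accelerating the transform and the coefficient-wise product covers most of the work in a ciphertext multiplication.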
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
Secure computation is of critical importance not only to the DoD but also across
financial institutions, healthcare, and anywhere personally identifiable
information (PII) is accessed. Traditional security techniques require data to
be decrypted before performing any computation. When processed on untrusted
systems, the decrypted data is vulnerable to attacks that extract the sensitive
information. To address these vulnerabilities, Fully Homomorphic Encryption
(FHE) keeps the data encrypted during computation and secures the results, even
in these untrusted environments. However, FHE requires a significant amount of
computation to perform equivalent unencrypted operations. To be useful, FHE
must significantly close the computation gap (within 10x) to make encrypted
processing practical. To accomplish this ambitious goal, the TREBUCHET project
is leading research and development in FHE processing hardware to accelerate
deep computations on encrypted data, as part of the DARPA MTO Data Privacy for
Virtual Environments (DPRIVE) program. We accelerate the major secure
standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security
while integrating with the open-source PALISADE and OpenFHE libraries currently
used in the DoD and in industry. We utilize a novel tile-based chip design with
highly parallel ALUs optimized for vectorized 128b modulo arithmetic. The
TREBUCHET coprocessor design provides a highly modular, flexible, and
extensible FHE accelerator for easy reconfiguration, deployment, integration
and application on other hardware form factors, such as System-on-Chip or
alternate chip areas.
Comment: 6 pages, 5 figures, 2 tables
RPU: The Ring Processing Unit
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to the extreme overheads of running them on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512, is developed to meet the needs of ring processing workloads while balancing high performance and general-purpose programming support. Having an ISA rather than fixed hardware facilitates continued software improvement post-fabrication and the ability to support evolving workloads. We then propose the ring processing unit (RPU), a high-performance, modular implementation of B512. The RPU has native large-word modular arithmetic support, capabilities for very wide parallel processing, and a large-capacity, high-bandwidth scratchpad to meet the needs of ring processing. We address the challenges of programming the RPU using a newly developed SPIRAL backend. A configurable simulator is built to characterize design tradeoffs and quantify performance. The best-performing design was implemented in RTL and used to validate simulator performance. In addition to our characterization, we show that an RPU using 20.5 mm² of GF 12nm can provide a speedup of 1485x over a CPU running a 64k, 128-bit NTT, a core RLWE workload.
Accelerating Fully Homomorphic Encryption by Bridging Modular and Bit-Level Arithmetic
The dramatic increase of data breaches in modern computing platforms has
emphasized that access control is not sufficient to protect sensitive user
data. Recent advances in cryptography allow end-to-end processing of encrypted
data without the need for decryption using Fully Homomorphic Encryption (FHE).
Such computation, however, is still orders of magnitude slower than direct
(unencrypted) computation. Depending on the underlying cryptographic scheme,
FHE schemes can work natively either at bit-level using Boolean circuits, or
over integers using modular arithmetic. Operations on integers are limited to
addition/subtraction and multiplication. On the other hand, bit-level
arithmetic is much more comprehensive allowing more operations, such as
comparison and division. While modular arithmetic can emulate bit-level
computation, there is a significant cost in performance. In this work, we
propose a novel method, dubbed bridging, that blends faster and restricted
modular computation with slower and comprehensive bit-level computation, making
them both usable within the same application and with the same cryptographic
scheme instantiation. We introduce and open source C++ types representing the
two distinct arithmetic modes, offering the possibility to convert from one to
the other. Experimental results show that bridging modular and bit-level
arithmetic computation can lead to 1-2 orders of magnitude performance
improvement for tested synthetic benchmarks, as well as one real-world FHE
application: a genotype imputation case study.
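The division of labour between the two arithmetic modes can be sketched on plaintext values. This is a simulation under stated assumptions and not the paper's implementation: plain 0/1 integers stand in for encrypted bits, integers modulo `t` stand in for modular ciphertexts, the conversion is assumed to recompose a value as the sum of its bits times powers of two, and all function names are hypothetical.

```python
def to_bits(value, width):
    # Little-endian bit decomposition; each entry stands in for one encrypted bit.
    return [(value >> i) & 1 for i in range(width)]


# Bit-level mode: only Boolean gates are available.
def xor_gate(a, b):
    return a ^ b


def and_gate(a, b):
    return a & b


def not_gate(a):
    return 1 - a


def less_than(a_bits, b_bits):
    """Unsigned a < b as a ripple comparator built from gates only."""
    lt = 0
    for a, b in zip(a_bits, b_bits):  # least significant bit first
        diff = xor_gate(a, b)
        # Where the bits differ, this position decides (a < b iff b is 1);
        # otherwise the result from the lower bits is kept.
        lt = xor_gate(and_gate(diff, b), and_gate(not_gate(diff), lt))
    return lt


def bits_to_modular(bits, t):
    """Assumed bridge: recompose sum(b_i * 2^i) mod t with add and constant mul."""
    acc = 0
    for i, b in enumerate(bits):
        acc = (acc + b * pow(2, i, t)) % t
    return acc


def bridged_max(a, b, width=8, t=65537):
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    # Comparison needs bit-level mode.
    lt = less_than(a_bits, b_bits)
    # Cross into modular mode, where selection is just add and multiply.
    lt_mod = bits_to_modular([lt], t)
    a_mod = bits_to_modular(a_bits, t)
    b_mod = bits_to_modular(b_bits, t)
    return (lt_mod * b_mod + (1 - lt_mod) * a_mod) % t
```

Only the comparison runs as a gate-by-gate circuit. The selection and any later additions and multiplications stay in the cheaper modular mode, which is the kind of split the abstract credits for the reported speedup.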